Abstract

This paper addresses the following version of the multiple range scan registration problem. A scanner with an associated intensity camera is placed at a series of locations throughout a large environment; scans are acquired at each location. The problem is to decide automatically which scans overlap and to estimate the parameters of the transformations aligning these scans. Our technique is based on (1) detecting and matching keypoints, which are distinctive locations in range and intensity images, (2) generating and refining a transformation estimate from each keypoint match, and (3) deciding if a given refined estimate is correct.

While these steps are familiar, we present novel approaches to each. A new range keypoint technique is presented that uses spin images to describe holes in smooth surfaces. Intensity keypoints are detected using multiscale filters, described using intensity gradient histograms, and backprojected to form 3D keypoints. A hypothesized transformation is generated by matching a single keypoint from one scan to a single keypoint from another, and is refined using a robust form of the ICP algorithm in combination with controlled region growing. Deciding whether a refined transformation is correct is based on three criteria: alignment accuracy, visibility, and a novel randomness measure. Together these three steps produce good results in test scans of the Rensselaer campus.


1.  Introduction 

This paper addresses a new form of the multiple range image registration problem. The data are range scans taken from a variety of positions throughout a large area; a university campus is our test case. These scans might overlap by a little bit, by a large fraction, or not at all. The scans are not taken incrementally, and there will be at most a few scans of any region. Challenges include handling a variety of structures (such as buildings and landscapes), moving objects, changes in illumination, and significant occlusions. The goal is to determine which scans overlap and to compute the transformations that best align overlapping scans. We approach this goal as a “location recognition problem” because the major issue is a decision about which scans show part of the same environment.

Previous algorithms have approached multiple range image registration by assuming that either coarse initial alignments are available, or that the scans are taken of static scenes and have substantial overlap between them. The algorithms we develop should certainly work when such assumptions are met, but our goal is to place no restrictions on the relative placement of the scans. Similarly, GPS and/or inertial navigation units may be available to provide initial location estimates, but these systems sometimes fail. The techniques we develop here can help detect and correct these failures. Thus, solutions to the location recognition problem will make 3D environment modeling systems more autonomous, more flexible, and more robust.

We present our initial work toward solving this problem. Our overall approach is built on three algorithmic components: hypothesis generation, refinement, and verification. Hypothesis generation is the creation of initial transformation estimates through matching of keypoints, which are locations of distinctive structure in range scans or in the associated intensity images. We introduce a new range keypoint extraction technique, and we adapt image keypoint extraction techniques from the computer vision literature. Each pair of scans will lead to the creation of many keypoint matches.

A hypothesized 3D rigid transformation between a pair of range scans is generated for each keypoint match individually, even for intensity-image keypoint matches. Creating an estimate from each individual match avoids the need for combinatorial search methods or clustering techniques. This is important because particularly difficult data sets have relatively few correct matches. Since the initial estimate generated from a keypoint match can only be trusted in a small region of the data surrounding the keypoint in each scan, even if the correspondence is correct, the refinement technique alternates steps of (1) re-estimating the transformation using a robust form of the iterative closest points (ICP) algorithm applied within the region, and (2) expanding the region. This allows inaccurate initial estimates to be locally refined before being applied to entire data sets.

The final component, verification, decides if a given refined transformation is correct and therefore if the location seen in one scan has been recognized in the other. This step is crucial in our approach because we make no attempt to pre-filter matches other than by the match similarity measure. Similar to other techniques [12], our decision criteria include measurements of the alignment accuracy and the fraction of visible surface points matched. Building from [4], we include a novel third measure designed to determine if the alignment is unambiguous. This new measure is based on regenerating keypoints using the estimated transformation. The final decision criterion is based on a combination of these accuracy, coverage, and ambiguity measures.

2.  Related  Work 

Several papers have addressed the problem of multiscan registration for building 3D models. In earlier work for modeling single objects, many overlapping scans are acquired and the main problem is high-precision registration starting from reasonable initial estimates (see e.g. [1, 13]). Some recent work has focused on modeling outdoor scenes, either single buildings or larger-scale areas. In [21, 22], registration is initialized by matching 3D line segments between scans, and clustering matched line segments to determine which scans overlap. In [8], data are obtained continually from a scanner mounted on a moving vehicle; airborne imagery is used to guide the alignment of these scans into a complete model. Recent multiscan registration work [12] has used spin images [14] to initialize registration, a graph-based multiscan refinement algorithm, and a verification step based on a combination of accuracy and visibility. While this work has an underlying assumption of static scenes and substantial overlap between scans, several of the techniques developed are applied and extended in our work.

Initialization methods have been the focus of substantial effort in recognition and in registration, using both 2D images and 3D range scans. For 3D scans, early work focused on detecting and matching corners and points of high curvature [5]. More recent work has focused on the use of spin images and related representations as summary descriptions (sometimes called “point descriptors”) for matching [7]. Work on intensity images has emphasized both the detection and description of point locations. Detection algorithms have used the multiscale Laplacian of Gaussian [16], and multiscale corner detection [18], among others. While a host of descriptors has been proposed, affine-invariant versions of Lowe’s SIFT descriptor, which is a scale- and contrast-normalized histogram of intensity gradients, and steerable filters have proven most effective [17]. Finally, researchers have combined both intensity and range data in matching, focusing on filtering matches [19] or using range data to normalize keypoint descriptions [23]. Both have reported preliminary results on alignment of single objects.

Work in refinement algorithms for range scan registration has focused on the ICP algorithm, which alternately generates temporary correspondences based on an estimated transformation and refines the transformation estimate based on the correspondences. ICP was proposed almost simultaneously in several papers in the early 1990s (e.g. see [2, 6]), and has been studied extensively since [20].

Automatic verification, the problem of determining whether an alignment between a pair of scans or images is correct, has received less attention. Most techniques are simply based on the accuracy of the alignment. More recent work has included tests based on visibility and occlusion [12], with the use of a Bayesian classifier to formulate the decision criteria [11]. In image registration, recent keypoint matching algorithms [4] use a probabilistic argument to determine the minimum number of “good” keypoint matches required for an alignment to be considered correct. We build on all of these techniques in our decision criteria.

3.  Sensor  and  Data  Collection 

All data for the experiments in this paper were collected around the Rensselaer campus using a Leica HDS 3000 scanner (Figure 1). Bore-sighted RGB images are acquired by the scanner prior to the start of each scan. The images are well-calibrated relative to the scanner position, so that the lines of sight of the range points and image points coincide. Our experiments have verified that errors in the alignment are less than 0.1°. The data include both buildings and natural scenes. In some cases a building that is clearly visible in one scan is significantly occluded in another.

4.  Keypoint  Extraction  and  Matching 

Keypoints are distinctive image locations that can be described in an approximately viewpoint-independent manner. They have been used for both object recognition and registration. Our work on the location recognition problem uses them both as the basis for hypothesis generation and as part of the hypothesis validation procedure. We propose two keypoint techniques, one based solely on the range data and one based primarily on intensity images. Range keypoints summarize local 3D structure at distinctive locations in the range scan, while intensity keypoints summarize local intensity structure at distinctive locations in the intensity data. These therefore offer independent, almost complementary, approaches to matching and hypothesis generation.

Figure 1. Three example range scans taken on the Rensselaer campus illustrating some of the challenges of the location recognition problem, including occlusions, viewpoint variations, and different types of structures varying from buildings to trees and hillsides (right). The left and center pair has an extremely small overlap region, with the farthest-away building at the top of the center scan being the main building in the left scan.

4.1.  Range  Keypoints  and  Matching 

Our approach to developing range keypoints starts from the idea of a spin image [14], a 2D histogram of range points computed in a cylindrical coordinate system centered at a point of interest. The cylindrical axis may be oriented along an estimated surface normal or along the axis of least second moment. Integrating around the rotation axis condenses the coordinate system from 3D to a 2D histogram (hence the term “spin image”). Spin images are descriptors, not location detectors, and therefore must be combined with a location detection technique. Earlier work used all the data or a uniform sampling of locations [7, 12, 14]. We present an alternative approach designed to handle widely varying viewpoints, low image overlaps, and occlusions.

We detect locations at which to compute keypoints by finding holes in smooth surfaces. The spin image coordinate system is centered on the center of mass of the hole boundary, and the spin image is computed over a region whose size is a small constant multiple of the diameter of the hole (see Figure 2). In this way both the shape of the hole and the geometric structure of the surrounding region are combined in the spin image descriptor. Holes in man-made structures occur primarily at doors, windows, and indentations or outcroppings on the faces of buildings. There are not significant self-occlusions in the regions immediately surrounding the holes as there would be near more substantial depth discontinuities, so the computed spin image should not change substantially with viewpoint (provided the spin-image calculation accommodates sampling differences). One major concern with using holes is that they represent repetitive structures; e.g. different windows on the same building often look the same. We accommodate this in part by expanding the size of the spin image region so that surrounding structure will make the spin image descriptors more distinctive. We also account for this in our matching and decision criteria, described below.

Clearly, the detection of holes requires segmentation, but segmentation of range images is still an unsolved problem [10]. Fortunately, we are more concerned with reliably and repeatably extracting the same region in different scans than with computing a segmentation that agrees with human perception. Moreover, our algorithm needs only one successfully matched hole in order to succeed, relaxing constraints on the robustness of the detection algorithm.

The segmentation technique itself extracts nearly planar surface regions using a technique similar to [3]. Robust estimates of local surface normals are first computed, and then seed regions are found where the smallest eigenvalue of the local covariance matrix is sufficiently small. These seeds are grown into surface patches and clustered based on similarity in surface normals and locations. Growth is prioritized so that a surface grows fastest where it is flattest. At the end, all clusters with fewer than 30 data points are eliminated. Sufficiently large regions of “missing pixels” in the planar patches are detected as holes. The missing regions need not be completely enclosed by the patch (e.g. a doorway at the base of a building). Typically 5-15 holes are extracted from range scans of buildings (Figure 2).
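The seed test described above can be illustrated with a short sketch. It uses a plain (non-robust) covariance estimate, whereas the paper computes robust normals, and the names `planarity`, `is_seed`, and the threshold `max_eig` are hypothetical choices made for illustration only.

```python
import numpy as np

def planarity(neighborhood):
    """Return the smallest eigenvalue of the local 3x3 covariance matrix
    and its eigenvector, which serves as the estimated surface normal."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

def is_seed(neighborhood, max_eig=1e-4):
    """A neighborhood is a seed candidate when it is nearly planar.
    max_eig is an assumed threshold; the paper does not state its value."""
    smallest, _ = planarity(neighborhood)
    return smallest < max_eig
```

A small smallest eigenvalue means the points have almost no spread along one direction, which is the sense in which a region is "sufficiently flat" to start growing a patch.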

Figure 2. Automatically detected range keypoints, i.e. holes in planar surfaces where spin images are computed. Black pixels represent points on planar surfaces. In this example all significant holes are found. The circles represent the area over which the spin images are computed.

Spin images are computed by placing a coordinate origin at the center of the hole and orienting the axis of the spin cylinder along the surface normal of the surrounding surface. The contribution of each point is weighted by the inverse of an estimate of the local sampling density, similar to [7]. Spin images are matched by computing the cosine of the angle between the vectors formed by each spin image histogram, giving a similarity measure in the interval [0,1].
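As a rough illustration of the descriptor and its comparison, the following sketch builds a spin image as a weighted 2D histogram over (radial distance, height along the axis) and compares two histograms by the cosine of the angle between them. The bin count, the cylinder extent, and the function names are assumptions, and the per-point `weights` stand in for the inverse sampling-density weights mentioned above.

```python
import numpy as np

def spin_image(points, center, normal, radius, bins=16, weights=None):
    """Weighted 2D histogram of points in cylindrical coordinates about `normal`."""
    n = normal / np.linalg.norm(normal)
    d = points - center
    beta = d @ n                                            # height along the axis
    alpha = np.linalg.norm(d - np.outer(beta, n), axis=1)   # distance from the axis
    keep = (alpha <= radius) & (np.abs(beta) <= radius)     # assumed cylinder extent
    if weights is None:
        weights = np.ones(len(points))
    hist, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=bins,
                                range=[[0.0, radius], [-radius, radius]],
                                weights=weights[keep])
    return hist

def spin_similarity(h1, h2):
    """Cosine of the angle between two flattened spin images, in [0, 1]
    for non-negative histograms."""
    a, b = h1.ravel(), h2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Because the angle around the axis is discarded, rotating the points about `normal` leaves the histogram essentially unchanged, which is what makes the descriptor usable across viewpoints.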


4.2. Intensity Keypoints and Matching

Following [16], intensity keypoints are found by detecting peaks in the response of the Difference-of-Gaussians operator in the bore-sighted intensity images. Each keypoint has a location, orientation, scale, and descriptor. We use Lowe’s SIFT descriptor [16], a 128-component histogram of normalized gradients, which has proven to be effective in experimental evaluations [17].

Equally-spaced samples on the perimeter of the scale-neighborhood around each 2D keypoint are backprojected along the camera’s line of sight to the range surface and are used for three purposes: (1) If the change in depth between two consecutive samples is greater than some fraction (0.7) of the total variation in depth around the circle, then a depth discontinuity is identified and the keypoint is discarded (Figure 3). (2) The samples’ distances from the 3D keypoint center are used to estimate a 3D scale. (3) The samples are used to estimate a local plane. The normal to this plane becomes the z axis of a local 3D coordinate system with its origin at the keypoint center. The x axis is computed by back-projecting the 2D keypoint orientation onto the local plane. A right-handed coordinate system is completed with y = z × x.

Figure 3. Intensity keypoints before (left) and after (right) range image filtering to eliminate keypoints near depth discontinuities.

Matching these 3D keypoints follows the standard methods in the literature. For each moving scan keypoint $p_i$, the two keypoints $q_{i,0}$ and $q_{i,1}$ with the best matching descriptors from the fixed image are found. Let $S(p, q)$ denote the similarity score between the descriptors associated with $p$ and $q$. Then, if

$$\frac{S(p_i, q_{i,0})}{S(p_i, q_{i,1})} < \theta \qquad (1)$$

for some $\theta < 1$, the match $(p_i, q_{i,0})$ is accepted as sufficiently unique. This ratio test is applied to the best match for each keypoint $p_i$, and the surviving matches are sorted by increasing order of the ratio (1).


5. Refinement


At the refinement stage we are given a set of range and intensity keypoint correspondences. Each correspondence matches a 3D point from each of two different scans. The issue is how to turn these correspondences into hypothesized and refined transformation estimates. There are two novelties to our refinement procedure:

•  Initialization  from  a  single  match 

•  A  region  growing  variant  of  ICP 

Instead of attempting to combine matches to estimate the transformation, an initial transformation between scans is computed for each match and then refined. Other techniques in the literature search for subsets of consistent matches to generate and evaluate the transformation estimate. This involves either a combinatorial search or clustering in parameter space. Both become increasingly problematic as the fraction of correct keypoint matches decreases (e.g. due to occlusions or differences in viewpoint). We do not want to rely on having a sufficient set of correct correspondences, either in percentage or absolute numbers; such a reliance limits the overall robustness of the algorithm. Instead we rely on rapid ICP refinement and a decision procedure that can reliably determine whether or not an alignment between scans is correct. We will therefore be able to test hypothesized transformations one-by-one and stop when one has been generated that is correct.

Computing the initial rigid transformation from a match between one keypoint from each of two scans is simple if a 3D coordinate system has been established at each keypoint. For an intensity keypoint, this coordinate system is established when the keypoint is backprojected into 3D. For range keypoints, establishing the local coordinate system is straightforward. The local surface normal becomes the z axis. A second axis can either be computed from the moments of the hole boundary points or from the gravity direction obtained from the scanner. Projecting this onto the plane normal to z produces the x axis. The y direction is the cross-product z × x.
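A minimal sketch of this construction follows, assuming each keypoint supplies an origin, a normal, and a second direction that is not parallel to the normal. The function names and the 4x4 homogeneous representation are illustrative choices, not taken from the paper.

```python
import numpy as np

def local_frame(origin, normal, direction):
    """4x4 matrix mapping keypoint-local coordinates to scan coordinates.
    z is the normal, x is `direction` projected onto the plane normal to z,
    and y = z x x completes a right-handed system."""
    z = normal / np.linalg.norm(normal)
    x = direction - (direction @ z) * z
    x = x / np.linalg.norm(x)          # fails if direction is parallel to normal
    y = np.cross(z, x)
    frame = np.eye(4)
    frame[:3, 0] = x
    frame[:3, 1] = y
    frame[:3, 2] = z
    frame[:3, 3] = origin
    return frame

def transform_from_match(frame_moving, frame_fixed):
    """Rigid transformation taking moving-scan coordinates to fixed-scan
    coordinates so that the two matched local frames coincide."""
    return frame_fixed @ np.linalg.inv(frame_moving)
```

Since each frame fixes all six degrees of freedom, a single keypoint match is enough to produce a full hypothesized transformation.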

A region growing variant of the ICP algorithm addresses a problem caused by using a single correspondence to initialize the transformation estimate. The initial estimate is often only accurate in the region of the two data sets immediately surrounding the correspondences (see Figure 4). This is especially likely for intensity keypoints because they are more local than range keypoints. Applying ICP throughout the data set starting from this initial estimate can lead to mismatches and incorrect convergence, even if the correspondence is correct. Our solution is to refine the estimate in a small region surrounding the corresponding points in each range scan, and then double the radius of the region and repeat the process.

Aside from this region growing, the actual ICP procedure is straightforward. Normal distance constraints are used [6, 20]. Matches are robustly weighted using the Cauchy M-estimator weight function [9], designed for a gradual downweighting of outliers. The robust standard deviation is estimated for the first ICP iteration in each region. While we could easily incorporate intensity measures into the ICP refinement process [15], we currently reserve the use of intensity constraints for the decision-making step. Overall, the algorithm tends to converge quickly.
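The alternation of robust re-estimation and region doubling might be sketched as follows. This is a simplified stand-in, not the authors' implementation: it uses point-to-point residuals with a weighted SVD fit instead of normal distance constraints, brute-force nearest neighbors, a median-based scale estimate, and an assumed Cauchy tuning constant `c`. All function names and the fixed iteration counts are hypothetical.

```python
import numpy as np

def cauchy_weights(residuals, sigma, c=2.385):
    """Cauchy M-estimator weights: gradual downweighting of large residuals."""
    return 1.0 / (1.0 + (residuals / (c * sigma)) ** 2)

def weighted_rigid_fit(src, dst, w):
    """Weighted least-squares R, t with dst ~ R @ src + t (SVD solution)."""
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def region_growing_icp(moving, fixed, R, t, seed, radius, n_growths=4, n_iters=10):
    """Refine (R, t) inside a ball around `seed`, then double the radius and repeat."""
    for _ in range(n_growths):
        region = moving[np.linalg.norm(moving - seed, axis=1) <= radius]
        sigma = None
        for _ in range(n_iters):
            mapped = region @ R.T + t
            d2 = ((mapped[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            res = np.sqrt(d2[np.arange(len(region)), nearest])
            if sigma is None:
                # Robust scale, estimated once per region on the first iteration.
                sigma = max(1.4826 * np.median(res), 1e-9)
            w = cauchy_weights(res, sigma)
            R, t = weighted_rigid_fit(region, fixed[nearest], w)
        radius *= 2.0
    return R, t
```

Starting small keeps the early correspondences close to the keypoint, where the single-match estimate is most trustworthy, and each doubling brings in points that the improved estimate can now align.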

6.  Verification  Criteria 

The final step, verification, requires decision criteria to determine if a refined transformation is correct. We are interested in an absolute measure of correctness: one that can be applied to a single transformation estimate rather than requiring a comparison between different estimates. This allows transformation estimates to be evaluated one-by-one using a greedy approach, but requires a rich measure that clearly separates good transformation estimates from poor ones. We combine three measures:

•  Accuracy  of  the  alignment 

•  Visibility  constraints 

•  A  non-randomness  score 


In the following, we denote $I_1$ as the moving scan, and $I_2$ as the fixed scan.

The accuracy measure is a robust mean-square error of the transformation estimate. If $\{x_i\}$ is the set of points in $I_1$, $\{y_i\}$ is the set of closest points in $I_2$ with associated normals $\{\eta_i\}$, and $a$ is the set of estimated transformation parameters, then the accuracy measure is

$$\sigma^2 = \frac{\sum_i w_i \left[ (T(x_i; a) - y_i) \cdot \eta_i \right]^2}{\sum_i w_i} \qquad (2)$$

for robust weights $w_i$. This measure comes directly from the ICP process. For a correct transformation estimate, the accuracy must be close to the approximate noise of the sensor. This requirement is not sufficient, however. Alignment of repetitive structures and accidental alignment of subsets of the scene can make incorrect alignments seem accurate, whereas changes in the scene can make correct alignments seem inaccurate.
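The accuracy measure is a weighted average of squared normal distances, which a few lines can make concrete. The function name `robust_mse` and the array layout (one row per point, already mapped by the estimated transformation) are assumptions for illustration.

```python
import numpy as np

def robust_mse(mapped, closest, normals, weights):
    """Weighted mean of squared point-to-plane distances.
    mapped  : T(x_i; a), the transformed moving-scan points, shape (n, 3)
    closest : y_i, closest fixed-scan points, shape (n, 3)
    normals : unit normals at y_i, shape (n, 3)
    weights : robust weights w_i, shape (n,)"""
    normal_dist = np.einsum("ij,ij->i", mapped - closest, normals)
    return float(np.sum(weights * normal_dist ** 2) / np.sum(weights))
```

Only the component of each residual along the surface normal contributes, so sliding along a flat wall leaves this measure unchanged. That is one reason why accuracy alone cannot rule out misalignments of repetitive or planar structure.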

The visibility measure is designed to address some of these problems. This measure gives the fraction of points that appear to be incorrect when $I_1$ is mapped onto $I_2$. If the scene is static, then the mapping of a range point $x$ from $I_1$ into the coordinate system of $I_2$ should not occlude any points from $I_2$ from the perspective of the scanner; otherwise the location in the scene corresponding to $x$ would have been imaged in scan $I_2$. In theory, only scene changes and object motion between the acquisition of the two scans can cause a visibility violation in a correct alignment.

Assume the range scan is represented as a depth image $I_2(u, v)$. Each point $x_i$ from $I_1$ is mapped into $I_2$. If the mapped point is within the field of view of the scanner, let $u_i, v_i$ be its image coordinates in $I_2$ and $z_i$ be its depth value. If $z_i < I_2(u_i, v_i) - c\sigma$, where $c$ is a small constant multiplier on the error $\sigma$ from (2), then we say that point $i$ violates the visibility constraint. The visibility error measure from $I_1$ to $I_2$ is the fraction of the mapped points within the field of view of $I_2$ that violate the visibility constraint. A second visibility measure is computed by reversing the roles of $I_1$ and $I_2$ in the foregoing. The final visibility measure is the maximum of these measures.
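A sketch of the visibility test is given below. It assumes the mapped points have already been projected to integer pixel coordinates in the fixed scan's depth image; that projection depends on the scanner model and is not shown. The function names and the argument `c_sigma` (the margin $c\sigma$) are illustrative.

```python
import numpy as np

def visibility_violation_fraction(u, v, z, depth_fixed, c_sigma):
    """Fraction of mapped points inside the field of view that lie in front
    of the fixed scan's surface by more than the margin c_sigma.
    u, v : integer pixel coordinates (column, row) of the mapped points
    z    : depth of each mapped point as seen from the fixed scanner"""
    h, w = depth_fixed.shape
    in_view = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not in_view.any():
        return 0.0
    violates = z[in_view] < depth_fixed[v[in_view], u[in_view]] - c_sigma
    return float(violates.mean())

def visibility_measure(frac_1_to_2, frac_2_to_1):
    """Final measure: the worse of the two directions."""
    return max(frac_1_to_2, frac_2_to_1)
```

Taking the maximum of the two directions means an alignment is penalized if either scan would have hidden surface that the other scan actually recorded.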

When there are changes in the scene, low overlap, or repetitive structures, even the visibility measure may not be enough to determine if an alignment is correct. A third measure is needed that evaluates the hypothesized transformation to determine if it represents a random alignment or a misalignment of a repetitive structure.

The non-randomness measure places keypoints in a central role. Each keypoint represents a somewhat distinctive structure in the image. The search for keypoint matches is in effect a search over a variety of possible transformations for the keypoint. Hence, if the transformation maps the keypoint onto its best match, then this is local evidence that the transformation is unique and non-random [4]. Transformations consistent with many keypoint matches are extremely unlikely to be wrong. An important challenge to using this observation, however, is that many original keypoints may not be consistently detected in the two scans, especially when the viewpoints differ substantially. To address this, keypoint matches that are inconsistent with their best initial match are rematched using the estimated transformation. If the new match is better than the initial match for this keypoint, then this is additional evidence that the transformation is non-random and therefore correct.

Figure 4. Refining a transformation based on a single keypoint match. The pair is the left two scans from Figure 1. Only 1% of the moving scan and 11% of the fixed scan are in the overlap region. Left: an initial alignment based on a keypoint in the region circled in black. Points in white are from the moving scan. Note the substantial misalignment indicated by the black arrow. Right: the refined transformation estimate. The misalignment is corrected by our refinement procedure producing a final robust error of 4mm.

To be more precise, let $M = \{(p_i, q_i)\}$ be the set of originally matched keypoints. Exactly one of these keypoints was used to initialize the transformation being tested. For each $p_i$, let $p'_i = T(p_i; a)$ be the mapping of $p_i$ based on the estimated transformation. For intensity keypoints this keypoint mapping includes a projection from 3D to 2D based on camera calibration parameters. We call $(p_i, p'_i)$ a “transform match”. Correspondences such that $p'_i$ is not visible in the fixed scan are removed. Let $M'$ be the resulting reduced set of matches. For each $p_i$, if $\|p'_i - q_i\|$ is less than a small threshold that depends on the scale of the original keypoint, then the original match is considered consistent with the transformation. Otherwise, we compute the similarity score $S(p_i, p'_i)$ for the transform match using the scale from $p_i$ for $p'_i$. If $S(p_i, p'_i)/S(p_i, q_i)$ passes the ratio test, then the transform match is considered consistent; it is significantly better than all original matches for $p_i$. The fraction of matches from $M'$ that are either originally consistent or transform consistent is then used as the “randomness measure”.
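The counting procedure can be sketched as below. The sketch assumes the score $S$ behaves like a distance (lower is better), which is how the ratio test with a threshold below 1 is read here, and it assumes a threshold of 0.8, a value the paper does not state. The `Match` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Match:
    visible: bool     # is the mapped keypoint p'_i visible in the fixed scan?
    dist: float       # ||p'_i - q_i||
    tol: float        # scale-dependent distance threshold for this keypoint
    s_orig: float     # S(p_i, q_i), score of the original match
    s_trans: float    # S(p_i, p'_i), score of the transform match

def nonrandomness(matches, theta=0.8):
    """Fraction of visible matches that are originally consistent or
    transform consistent with the estimated transformation."""
    kept = [m for m in matches if m.visible]      # the reduced set M'
    if not kept:
        return 0.0
    consistent = 0
    for m in kept:
        if m.dist < m.tol:
            consistent += 1                       # originally consistent
        elif m.s_orig > 0 and m.s_trans / m.s_orig < theta:
            consistent += 1                       # transform consistent
    return consistent / len(kept)
```

A match that fails the distance check still counts when the descriptor found at the mapped location is clearly better than the one originally matched, which is how keypoints that were not detected in both scans still contribute evidence.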

We therefore have three decision (verification) measures: robust mean-square error, visibility, and randomness. Empirically, we set thresholds on each and require a transformation to pass all three to be considered verified as correct. We require the alignment error to be within 6 times the scanner error, the visibility violations to be at most 20%, and at least 70% of the initially matched keypoints visible in both scans to be consistent with the final transformation. More sophisticated techniques could (and should) be developed, such as in [11], especially when there could be substantial environmental changes between scans.
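The three thresholds combine into a single conjunctive test, written out here as a small sketch. The function and argument names are hypothetical; only the numeric thresholds come from the text.

```python
def is_verified(sigma, scanner_error, visibility_violation, consistent_fraction):
    """A transformation is accepted only if it passes all three tests:
    alignment error within 6x the scanner error, at most 20% visibility
    violations, and at least 70% of visible keypoint matches consistent."""
    return (sigma <= 6.0 * scanner_error
            and visibility_violation <= 0.20
            and consistent_fraction >= 0.70)
```

For example, with an assumed scanner error of 4 mm, an alignment with a 20 mm robust error passes the first test, since the limit would be 24 mm.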

7.  Experiments  and  Discussion 

We evaluate the three components of our location recognition system (keypoint detection and matching, refinement, and decision making) and validate an overall system that combines them. For the tests we run here, the algorithm attempts to align scans in pairs. (Generalization to more than two scans is conceptually straightforward by indexing descriptors to simultaneously recover overlapping pairs and the corresponding pairwise transformations, but this is beyond the scope of our current experiments.) For each image, range and intensity keypoints are both extracted. For each pair, matches between keypoints are generated and rank-ordered, by spin-image comparison for range keypoints and by the ratio test for intensity keypoints. Up to 50 range keypoint and 200 intensity keypoint matches are retained. Each match is used to generate a hypothesis, which is then run through the refinement and verification procedures. While we did this here for the purposes of the experimental evaluation, in practice we use a greedy approach, where the matches are tested one-by-one and the procedure ends as soon as a verified transformation is found. One goal of the experiments is to show that this approach is viable.
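The greedy strategy amounts to a short loop over the rank-ordered matches. In this sketch the three stages are passed in as callables; `hypothesize`, `refine`, and `verify` are placeholder names for the components described in Sections 5 and 6, not functions defined by the paper.

```python
def first_verified(matches, hypothesize, refine, verify):
    """Test rank-ordered matches one by one and stop at the first
    refined transformation that passes verification.
    Returns (estimate, number_of_matches_tested), with estimate None on failure."""
    for tested, match in enumerate(matches, start=1):
        estimate = refine(hypothesize(match))
        if verify(estimate):
            return estimate, tested
    return None, len(matches)
```

The approach only pays off when verification is reliable on its own, because an early false acceptance would end the search before the correct match is ever tried.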

We  have  acquired  22  scans  from  various  locations  on  the 
Rensselaer  campus.  Of  these,  15  distinct  pairs  overlap,  but 
5  of  them  overlap  by  an  extremely  small  amount  (e.g.  see 
Figure 1). We therefore consider that we have 10 “reasonable
pairs” and 5 “challenging pairs”. We have manually
verified transformations for each of the 15 pairs, and we use
these as “ground-truth” for testing. We added another 15
pairs of scans that do not overlap to form an initial test suite
of 30 scan pairs.

Figure 5. Two scans (top and center) for which range keypoint
matching succeeded (bottom), but intensity keypoint matching
failed. Range keypoints succeeded despite the highly-repetitive
structure of the building since only one correct match was needed.

The  overall  summary  result  is  that  all  10  of  the  reason¬ 
able  pairs  and  1  of  the  5  challenging  pairs  were  correctly 
registered  (Figure  4),  and  all  15  of  the  non-overlapping  pairs 
were  rejected  (none  of  the  transformations  survived  the  ver¬ 
ification  test).  By  “correctly  registered”  we  mean  that  at 
least  one  keypoint  match  was  refined  to  an  estimate  ex¬ 
tremely  close  to  the  ground  truth.  In  only  2  cases  among 
all  tests  did  an  incorrect  match  pass  the  verification  test  — 
these  were  both  for  the  reasonable  pairs  and  both  were  ex¬ 
tremely  subtle  misalignments.  The  first  involved  scans  hav¬ 
ing  substantially  different  viewpoints  and  scales;  the  sec¬ 
ond  involved  a  small  vertical  translation  error.  In  both  cases 
the  error  was  within  the  scale  of  the  intensity  keypoints,  so 
these  matches  were  counted  as  correct.  Interestingly,  for  the 
4 challenging pairs that were unmatched, hand-selection of
a  single  match  followed  by  refinement  and  verification  pro¬ 
duced  a  correct  registration. 
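
For reference, the pair-level counts in this summary can be tallied as follows: 11 of the 15 overlapping pairs (about 73%) were registered, and all 15 non-overlapping pairs were rejected. The snippet is only a bookkeeping sketch; the names `OUTCOMES` and `summarize` are introduced here for illustration.

```python
# Pair-level outcomes reported above: (correct outcomes, pairs in group).
OUTCOMES = {
    "reasonable": (10, 10),       # overlapping pairs, correctly registered
    "challenging": (1, 5),        # very low overlap, correctly registered
    "non_overlapping": (15, 15),  # non-overlapping pairs, correctly rejected
}


def summarize(outcomes):
    """Aggregate the per-group counts into suite-level totals and rates."""
    overlapping = [outcomes["reasonable"], outcomes["challenging"]]
    registered = sum(ok for ok, _ in overlapping)
    overlapping_total = sum(n for _, n in overlapping)
    rejected, non_overlapping_total = outcomes["non_overlapping"]
    return {
        "registered": registered,
        "overlapping_total": overlapping_total,
        "registration_rate": registered / overlapping_total,
        "rejection_rate": rejected / non_overlapping_total,
        "suite_size": overlapping_total + non_overlapping_total,
    }
```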

These  preliminary  results  indicate  the  potential  of  the 
overall  approach.  More  detailed  results  on  the  individual 
components  of  the  system  are  summarized  as  follows: 


• Intensity keypoint matching produced at least one verified
match for 10 of the 11 aligned pairs. Range keypoint
matching produced at least one verified match for 6 pairs;
these were all of the pairs for which at least one range
keypoint match existed. Range keypoint
matching failed for the other pairs when there were no
common building structures or the “holes” were occluded.
The pair for which range keypoints succeeded but intensity
keypoints failed is shown in Figure 5.
There was a wide baseline between scans and the windows
are highly specular, causing the complete failure
of intensity keypoints. Range keypoints were able to
succeed despite the redundancy of the holes.

• Using the ground-truth, keypoint matches can be labeled
as approximately correct. For intensity keypoints
among the 10 reasonable pairs, there was a wide
variation  in  the  fraction  of  correct  matches,  ranging 
from  29  out  of  30  to  1  out  of  30.  Even  worse,  for 
the  one  extremely  challenging  pair,  there  was  only  one 
correct  match  in  the  top  200  and  it  was  ranked  161. 
Similar  though  less  extreme  results  were  obtained  for 
range  keypoints,  with  some  scan  pairs  having  40%  cor¬ 
rect  and  some  less  than  5%. 

•  Among  the  approximately  correct  initial  matches,  76% 
were refined to produce transformations that are consistent
with the ground-truth (L2-norm distance within
a  small  multiple  of  the  sensor  noise).  The  region  grow¬ 
ing  aspect  of  the  refinement  was  generally  effective, 
especially  for  low  overlap  (see  Figure  4),  but  some¬ 
times  failed  when  initialized  in  large  planar  regions. 
Although  more  work  is  needed  to  improve  refinement, 
these  results  show  that  initializing  from  a  single  match 
is  a  viable  approach:  a  high  fraction  of  the  correct 
single  matches  are  verified,  and  when  this  verification 
fails  the  algorithm  is  likely  to  pick  and  verify  a  differ¬ 
ent  correct  match. 

•  For  the  randomness  component  of  the  verification  step, 
regeneration  of  keys  has  a  varying  impact.  In  many 
cases,  the  improvement  was  marginal,  adding  one  or 
two  correct  matches,  whereas  in  others  it  was  dramatic, 
e.g.  increasing  the  fraction  from  27%  to  56%. 

• In verification, using the randomness test (with intensity
keypoints) together with the visibility and the ICP error
tests produces 2 false positives (the two discussed
above) and 5 false negatives. The same results are obtained
with randomness and ICP error alone. ICP error
and visibility produce 4 false positives and 2 false
negatives. We expect the advantage of using the randomness
test to increase when more changes occur in
the environment between scans.
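
The ground-truth consistency test mentioned in the third item above (L2-norm distance within a small multiple of the sensor noise) might be sketched as follows. This is one possible reading, not a statement of the exact measure used: we assume the distance is the RMS discrepancy between sample points mapped by the estimated and by the ground-truth rigid transformations. The function names are hypothetical, and `multiple` is left as a required argument because its value is not given in the text.

```python
import math


def apply_rigid(rotation, translation, point):
    """Apply x -> R x + t, with R a 3x3 nested sequence and t a 3-vector."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def rms_discrepancy(estimate, ground_truth, points):
    """RMS L2 distance between points mapped by the two transformations.

    Each transformation is a (rotation, translation) pair.
    """
    if not points:
        raise ValueError("at least one sample point is required")
    total = 0.0
    for p in points:
        a = apply_rigid(estimate[0], estimate[1], p)
        b = apply_rigid(ground_truth[0], ground_truth[1], p)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(points))


def is_consistent(estimate, ground_truth, points, sensor_noise, multiple):
    """True if the discrepancy is within `multiple` times the sensor noise."""
    return rms_discrepancy(estimate, ground_truth, points) <= multiple * sensor_noise
```

Measuring the discrepancy on mapped points rather than on the transformation parameters keeps rotation and translation errors in the same units as the sensor noise.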


8.  Conclusion 
We have described a general version of the problem of
registering  multiple  range  scans  as  a  location  recognition 
problem. This approach includes novel techniques for keypoint
detection and description, hypothesis generation from
single  keypoint  matches,  refinement  using  a  combination 
of  robust  ICP  and  region  growing,  and  a  verification  pro¬ 
cedure  that  combines  measures  of  accuracy,  visibility  and 
randomness.  On  a  preliminary  data  set  taken  of  the  Rens¬ 
selaer  campus,  the  algorithm  has  successfully  registered  all 
but  the  lowest  overlap  scan  pairs,  and  rejected  all  but  the 
most  subtly  incorrect  transformation  estimates. 

We  conclude  by  offering  three  primary  observations 
about how to solve the location recognition (and registration)
problem  based  on  our  experiments: 

•  Both  range  and  intensity  keypoints  are  important. 
Range  keypoints  based  on  holes  and  spin  images  work 
extremely  well  on  data  sets  involving  buildings,  even 
for  varying  magnifications  and  baselines.  Intensity 
keypoints  are  effective  (a)  for  scans  involving  signifi¬ 
cant  occlusions  because  they  are  more  local  and  (b)  for 
scans  involving  natural  scenes  because  they  are  more 
generic.  Intensity  keypoints  are  less  effective  for  wide 
baselines  and  scenes  with  significant  specularities. 

•  There  is  no  need  to  cluster  matching  results.  A  greedy 
approach  that  starts  from  a  single  keypoint  match  and 
uses  a  combination  of  refinement  and  verification  pro¬ 
cedures  along  the  lines  we’ve  proposed  will  be  effec¬ 
tive.  When  there  are  many  correct  matches,  the  ap¬ 
proach  will  quickly  find  one  and  produce  a  correct 
transformation  estimate.  When  there  are  only  a  few 
correct  matches,  clustering  is  not  likely  to  be  effective. 

•  While  significant  tuning  still  must  be  done  on  our  cur¬ 
rent  refinement  and  verification  procedures,  hypothe¬ 
sis  generation  needs  the  most  future  work.  In  the  most 
difficult  examples  we’ve  tested,  a  correct  initial  corre¬ 
spondence  usually  exists,  but  it  ranks  very  low  initially. 
When  this  keypoint  is  “pulled  out”,  the  refinement  and 
verification  procedures  usually  succeed.  We  intend  to 
focus  on  both  range  keypoints  alone  and  combinations 
of  range  and  intensity  keypoints.  Handling  low  over¬ 
laps,  occlusions  and  varying  structures  will  remain  our 
primary  focus  in  this  work. 